STCOR-895 wait a loooong time for a "stale" rotation request #1548
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
As part of the RTR lifecycle, we write a rotation timestamp to local storage when the process starts and then remove it when it ends. This is a cheap way of making the rotation request visible across tabs, because all tabs read the same shared storage.
To avoid the problem of a cancelled request leaving cruft in storage, we inspect that timestamp and consider a request "stale" if it's too old. That was the problem here: our "too old" timeout was too short; on a busy server, or on a slow connection, or on a client far from its host (say, in New Zealand), two seconds was not long enough. The rotation request would still be active when stripes considered it "stale", allowing a second request to go through. But since the first request was just slow, not dead, the second one is treated as a token-replay attack by the backend, causing all active sessions for that user account to be immediately terminated.
Thus, waiting longer is a quick fix. A more detailed approach to tracking the request is detailed in the code-comments attached to #1547.
Refs STCOR-895